Decomposition for ISO/IEC 10646 Ideographic Characters
Abstract
Ideographic characters are often formed from smaller functional units, which we call character components. These components can be ideographic radicals, ideographs proper, or pure components that cannot stand alone and are used only together with others to form characters. Decomposition of ideographs has many applications and is particularly important in the study of Chinese character formation, phonetics, and semantics. However, the way a character is decomposed depends on how components are defined as well as on the decomposition rules. The 12 Ideographic Description Characters (IDCs) introduced in ISO 10646 are designed to describe characters in terms of their components. The Hong Kong SAR Government recently published two sets of glyph standards for ISO 10646 characters. These standards, the first of their kind, use character decomposition to specify a character's glyph by its components. In this paper, we first introduce the IDCs and show how they can be combined with components to describe two-dimensional ideographic characters in a linear fashion. Next, we briefly discuss the basic references and the character decomposition rules. We then describe the data structure and algorithms used to decompose Chinese characters into components and vice versa. We have also implemented our database and algorithms as an Internet application, the Chinese Character Search System, available at http://www.iso10646hk.net/. With this tool, users can easily search for characters and components in ISO 10646.
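The linear description mentioned above works because an Ideographic Description Sequence (IDS) is written in prefix notation: each IDC (U+2FF0 to U+2FFB) is followed by its operands, two for most IDCs and three for U+2FF2 and U+2FF3, and each operand may itself be a component or a nested IDS. The Python sketch below is only an illustration under stated assumptions, not the authors' data structure or the Chinese Character Search System's implementation. It parses an IDS into a tree, expands decompositions recursively, and builds a reverse index from components to characters so that the "vice versa" direction (finding characters by their components) becomes a set intersection. The names `parse_ids`, `all_components`, `build_component_index`, `find_by_components` and the `SAMPLE` table are hypothetical, and the sample decompositions are illustrative rather than taken from the paper's database or the HKSAR glyph standards.

```python
from collections import defaultdict

# Arity of the 12 Ideographic Description Characters U+2FF0..U+2FFB.
# U+2FF2 (left to middle and right) and U+2FF3 (above to middle and below)
# take three operands; the other ten take two.
IDC_ARITY = {chr(cp): 2 for cp in range(0x2FF0, 0x2FFC)}
IDC_ARITY["\u2FF2"] = 3
IDC_ARITY["\u2FF3"] = 3

# Hypothetical sample table: character -> IDS (illustrative data only).
SAMPLE = {
    "好": "⿰女子",  # U+2FF0 left to right
    "相": "⿰木目",
    "想": "⿱相心",  # U+2FF1 above to below
    "吾": "⿱五口",
    "語": "⿰言吾",
}


def parse_ids(ids):
    """Parse a prefix-notation IDS into nested tuples (idc, operand, ...)."""
    pos = 0

    def parse():
        nonlocal pos
        if pos >= len(ids):
            raise ValueError(f"truncated IDS: {ids!r}")
        ch = ids[pos]
        pos += 1
        arity = IDC_ARITY.get(ch)
        if arity is None:
            return ch  # leaf: a component, not an IDC
        return (ch, *[parse() for _ in range(arity)])

    tree = parse()
    if pos != len(ids):
        raise ValueError(f"trailing characters in IDS: {ids!r}")
    return tree


def leaves(tree):
    """Components at the leaves of a parsed IDS tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in leaves(child)]


def all_components(char, table):
    """Every component of char at every depth, expanding via table."""
    result = set()
    stack = [char]
    while stack:
        current = stack.pop()
        if current not in table:
            continue  # no further decomposition recorded
        for comp in leaves(parse_ids(table[current])):
            if comp not in result:  # also guards against cycles
                result.add(comp)
                stack.append(comp)
    return result


def build_component_index(table):
    """Reverse index: component -> set of characters that contain it."""
    index = defaultdict(set)
    for char in table:
        for comp in all_components(char, table):
            index[comp].add(char)
    return index


def find_by_components(index, *comps):
    """Characters containing all of the given components."""
    sets = [index.get(c, set()) for c in comps]
    return set.intersection(*sets) if sets else set()
```

For example, `all_components("想", SAMPLE)` yields 相, 心, 木 and 目, and `find_by_components(build_component_index(SAMPLE), "言", "口")` yields only 語, because 口 reaches 語 through the nested component 吾. Precomputing the reverse index trades memory for fast component search; a real system would also need the decomposition rules and component definitions the paper discusses.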
Publication date: 2002